Unsupervised Anomaly Detection

Authors

  • David Guthrie
  • Louise Guthrie
  • Ben Allison
  • Yorick Wilks
Abstract

This paper describes work on the detection of anomalous material in text. We show several variants of an automatic technique for identifying an 'unusual' segment within a document, and consider texts which are unusual because of author, genre [Biber, 1998], topic or emotional tone. We evaluate the technique using many experiments over large document collections, created to contain randomly inserted anomalous segments. In order to successfully identify anomalies in text, we define more than 200 stylistic features to characterize writing, some of which are well-established stylistic determiners, but many of which are novel. Using these features with each of our methods, we examine the effect of segment size on our ability to detect anomaly, allowing segments of size 100 words, 500 words and 1000 words. We show substantial improvements over a baseline in all cases for all methods, and identify the method variant which performs consistently better than others.
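The general idea in the abstract (describe each fixed-size segment by stylistic features, then look for the segment that differs most from the rest of the document) can be sketched as follows. This is only a minimal illustration and not the authors' method: the three features, the function names, and the z-score distance from the document mean are all assumptions made for the example, whereas the paper defines more than 200 features and compares several method variants.

```python
import math
import re


def segment_words(text, size):
    """Split text into consecutive segments of `size` words (the last may be shorter)."""
    words = text.split()
    return [words[i:i + size] for i in range(0, len(words), size)]


def features(words):
    """Three illustrative stylistic features (hypothetical picks, not the paper's set)."""
    n = len(words) or 1
    # Strip non-word characters so punctuation does not inflate word length.
    tokens = [re.sub(r"\W+", "", w).lower() for w in words]
    tokens = [t for t in tokens if t]
    m = len(tokens) or 1
    avg_word_len = sum(len(t) for t in tokens) / m
    type_token_ratio = len(set(tokens)) / m
    punct_rate = sum(1 for w in words if re.search(r"[^\w\s]", w)) / n
    return [avg_word_len, type_token_ratio, punct_rate]


def rank_segments(text, size=100):
    """Return segment indices ordered from most to least unusual."""
    vectors = [features(seg) for seg in segment_words(text, size)]
    if not vectors:
        return []
    k = len(vectors[0])
    count = len(vectors)
    means = [sum(v[j] for v in vectors) / count for j in range(k)]
    # A feature with zero spread gets a standard deviation of 1.0 to avoid division by zero.
    stds = [
        math.sqrt(sum((v[j] - means[j]) ** 2 for v in vectors) / count) or 1.0
        for j in range(k)
    ]
    # Score = Euclidean distance from the document mean in z-score space.
    scores = [
        math.sqrt(sum(((v[j] - means[j]) / stds[j]) ** 2 for j in range(k)))
        for v in vectors
    ]
    return sorted(range(count), key=lambda i: scores[i], reverse=True)
```

Calling `rank_segments(document_text, size=100)` returns the segment indices with the most unusual segment first; the `size` argument corresponds to the 100-, 500- and 1000-word segment sizes mentioned in the abstract.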

Similar resources

360° Anomaly Based Unsupervised Intrusion Detection

This paper is meant as a reference to describe the research conducted at the Politecnico di Milano university on unsupervised learning for anomaly detection. We summarize our key results and our ongoing and future work, referencing our publications as well as the core literature of the field to give the interested reader a roadmap for exploring our research area.

A Probabilistic Approach to Aggregating Anomalies for Unsupervised Anomaly Detection with Industrial Applications

This paper presents a novel, unsupervised approach to detecting anomalies at the collective level. The method probabilistically aggregates the contribution of the individual anomalies in order to detect significantly anomalous groups of cases. The approach is unsupervised in that its only input is a list of cases ranked according to their individual anomaly scores. Thus, any anomaly detection...

A Comparative Evaluation of Unsupervised Anomaly Detection Algorithms for Multivariate Data.

Anomaly detection is the process of identifying unexpected items or events in datasets, which differ from the norm. In contrast to standard classification tasks, anomaly detection is often applied on unlabeled data, taking only the internal structure of the dataset into account. This challenge is known as unsupervised anomaly detection and is addressed in many practical applications, for exampl...

Unsupervised Anomaly Detection in Network Intrusion Detection Using Clusters

Most current network intrusion detection systems employ signature-based methods or data mining-based methods which rely on labeled training data. This training data is typically expensive to produce. Moreover, these methods have difficulty in detecting new types of attack. In this paper, we have discussed anomaly based intrusion detection, pros and cons of anomaly detection, supervised and...

Unsupervised Anomaly Detection in Unstructured Log-data for Root-cause-analysis

Berkay Kicanaoglu. Tampere University of Technology, Master's Thesis, 64 pages, 0 appendix pages, April 2015. Master's Degree Programme in Information Technology. Major: Signal Processing. Examiner: Prof. Moncef Gabbouj.

Anomaly Intrusion Detection Design Using Hybrid of Unsupervised and Supervised Neural Network

This paper proposes a new approach to designing the system, using a hybrid of misuse and anomaly detection for training of normal and attack packets respectively. The method used for attack training is a combination of unsupervised and supervised Neural Networks (NN) for an Intrusion Detection System. By the unsupervised NN based on Self Organizing Map (SOM), attacks will be classified into small...

Publication date: 2007